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Abstract 

Complex computer codes, for instance simulating physical phenomena, 
are often too time expensive to be directly used to perform uncertainty, 
sensitivity, optimization and robustness analyses. A widely accepted method 
to circumvent this problem consists in replacing cpu time expensive computer 
models by cpu inexpensive mathematical functions, called metamodels. In 
this paper, we focus on the Gaussian process metamodel and two essential 
steps of its definition phase. First, the initial design of the computer code 
input variables (which allows to fit the metamodel) has to provide adequate 
space filling properties. We adopt a numerical approach to compare the 
performance of different types of space filling designs, in the class of the 
optimal Latin hypercube samples, in terms of the predictivity of the subsequent 
fitted metamodel. We conclude that such samples with minimal wrap-around 
discrepancy are particularly well-suited for the Gaussian process metamodel 
fitting. Second, the metamodel validation process consists in evaluating 
the metamodel predictivity with respect to the initial computer code. We 
propose and test an algorithm, which optimizes the distance between the 
validation points and the metamodel learning points in order to estimate 
the true metamodel predictivity with a minimum number of validation points. 
Comparisons with classical validation algorithms and application to a nuclear 
safety computer code show the relevance of this new sequential validation 
design. 

Keywords - Metamodel, Gaussian process, discrepancy, op- 
timal design, Latin hypercube sampling, computer experiment. 

1. Introduction 

With the advent of computing technology and numerical 
methods, investigation of computer code experiments remains 
an important challenge. Complex computer models calculate 
several output values (scalars or functions), which can depend 
on a high number of input parameters and physical variables. 
These computer models are used to make simulations as well 
as predictions, uncertainty analyses or sensitivity studies 0. 

However, complex computer codes are often too time ex- 
pensive to be directly used to conduct uncertainty propagation 
studies or global sensitivity analysis based on Monte Carlo 
methods. To avoid the problem of huge calculation time, it 
can be useful to replace the complex computer code by a 
mathematical approximation, called a metamodel ll29l , ifTSl . 
Several metamodels are classically used: polynomials, splines, 
generalized linear models, or learning statistical models like 
neural networks, regression trees, support vector machines 
15 1 . One particular class of metamodels, the Gaussian process 
(Gp) model, extends the kriging principles of geostatistics to 
computer experiments by considering the correlation between 
two responses of a computer code depending on the distance 



between input variables f29l. Numerous studies have shown 
that this interpolating model provides a powerful statistical 
framework to compute an efficient predictor of code response 

El, CI. 

From a practical standpoint, fitting a Gp model implies esti- 
mation of several hyperparameters involved in the covariance 
function. This optimization problem is particularly difficult in 
the case of a large number of inputs f5l, fT9 |. Several authors 
(for example [31] and L5J) have shown that the space filling 
designs are well suited to metamodel fitting. However, this 
class of design, which aims at obtaining the better coverage 
of the points in the space of the input variables, is particularly 
large, ranging from the well known Latin Hypercube Samples 
to low discrepancy sequences |5|. At the moment, no theoret- 
ical result gives the type of initial design, which leads to the 
best fitted Gp metamodel in terms of metamodel predictivity. 
In this work, we propose to give some numerical results in 
order to answer to this fundamental question. 

Another important issue we propose to address concerns the 
optimal choice of the test sample, i.e., the set of simulation 
design, which allows the most accurate metamodel validation 
using the minimal number of additional test observations. The 
validation of a metamodel is an essential step in practice 
|15|. By estimating the metamodel predictivity, we obtain a 
confidence degree associated with the use of the metamodel 
instead of the initial numerical model. Two validation methods 
are ordinarily used: the test sample approach [.11 J and the 
cross validation method ll23l . ll27ll . In this paper, we propose 
to perform numerical studies of the metamodel predictivity 
with respect to these validation methods. 

In the following section, we present the Gp model. In the 
third section, we present several criteria to optimize the choice 
of the initial input design. On two analytical examples, we 
evaluate the numerical performance of this optimal design in 
terms of Gp metamodel predictivity. In the fourth section, 
we look at the metamodel validation problem. Our solution 
consists in minimizing the number of test observations by 
using the recent algorithm of |6|, called the sequential val- 
idation design. We illustrate the relevance of this new design 
by performing intensive simulation on two analytical functions 
and an industrial example. Finally, a conclusion summarizes 
our results and gives some perspectives for this work. 



2. Gaussian process metamodeling 

Let us consider n realizations of a computer code. Each real- 
ization y{x) e R of the computer code output corresponds to 
a d-dimensional input vector x = (xi, . . . , Xd) G A', where X 
is a bounded domain of R^. The n points corresponding to the 
code runs are called the experimental design and are denoted 
as Xg = {x^^\ . . . ^x^^"^). The outputs will be denoted as 
Ys = {y^^\ . . . , y^'^^) with y^^^ = y{x^^^) Mi = l..n. Gaussian 
process (Gp) modeling treats the deterministic response y{x) 
as a realization of a random function Y{x), including a 
regression part and a centered stochastic process. This model 
can be written as: 

Y{x) = f{x)^Z{x). (1) 

The deterministic function f{x) provides the mean approxi- 
mation of the computer code. In our study, we use a one-degree 
polynomial model where f{x) can be written as follows: 

d 

where /3 = [/3o, . . . , is the regression parameter vector. 
It has been shown, for example in [21 1 and |19|, that such a 
function is sufficient, and sometimes necessary, to capture the 
global trend of the computer code. 

The stochastic part Z{x) is a Gaussian centered 
process fully characterized by its covariance function: 
Coy {Z (x), Z{u)) = a'^R{x^u)^ where denotes the vari- 
ance of Z and R is the correlation function that provides 
interpolation and spatial correlation properties. To simplify, 
a stationary process Z{x) is considered, which means that 
correlation between Z{x) and Z{u) is a function of the 
distance between x and u. Our study is focused on a particular 
family of correlation functions that can be written as a product 
of one-dimensional correlation functions Rf. 

d 

Cov(Z(x), Z{u)) = (T^R{x -u) = (T^Y[ ^li^i - ^0- 

1=1 

This form of correlation functions is particularly well adapted 
to get some simplifications of integrals in analytical uncer- 
tainty and sensitivity analyses f20 1 . More precisely, we choose 
to use the generalized exponential correlation function: 

d 

^0,p(^ - 1/) = ]^exp(-6>^|x^ - ui\P'), 
1=1 

where 6 = [Oi^ . . . ^OdY ^nd p = [pi^ - - - jPdY the correla- 
tion parameters (also called hyperparameters) with 6i > and 
< pi < 2 y I = l..d. This choice is motivated by the wide 
spectrum of shapes that such a function offers. 

If a new point = (x^, . . . , x^) G A' is considered, we 
obtain the predictor and variance formulas: 

E[Yop{x*)] = f{x*) + k{x*Y^:\Ys - f{Xs)) , (2) 
Var[FGp(a;*)] = - fe(a;*)*S;ifc(a;*) , (3) 



with Yqp denoting {Y\Ys,Xs,/3,a,d,p), 

k{x*) = [Cov(y(i),r(a;*)),...,Cov(y("),r(a;*))]* 
= a^[R0p{x('\x*), RQ^p{x("\x*))Y 

and the covariance matrix 

The conditional mean (Eq. Q) is used as a predictor. The 
variance formula (Eq. © corresponds to the mean squared 
error (MSE) of this predictor and is also known as the 
kriging variance. This analytical formula for MSE gives a local 
indicator of the prediction accuracy. More generally, Gp model 
provides an analytical formula for the distribution of the output 
variable at any arbitrary new point. This distribution formula 
can be used for sensitivity and uncertainty analysis ll2Ql . 

Regression and correlation parameters /3, a, 6 and p are 
ordinarily estimated by maximizing likelihood functions O. 
This optimization problem can be badly conditioned and 
difficult to solve in high dimensional cases (d > 5) |[T9l . 
Moreover, the estimation algorithms are particularly sensitive 
to the input design. The following section proposes to deal 
with this input design problem. 

3. Initial design for the metamodel fitting 

For computer experiments, selecting an experimental design 
is a key issue in building an efficient and informative meta- 
model. In this section, we describe the different properties than 
a computer experimental design has to reach. Some numerical 
tests support our discussion. 

3.1. Latin hypercube sampling 

Contrary to the Simple Random Sample (SRS, also called 
crude Monte Carlo sample), which consists of n independently 
and identically distributed samples, the well known Latin 
Hypercube Sample (LHS) consists in dividing the domain of 
each input variable in n equiprobable strata, and in sampling 
once from each stratum ll22ll . The LHS of a random vector 
X = (Xi, . . . , Xd), denoted {X^^\ . . . , X^^^), gives a sam- 
ple mean m = ^ Yl^=i the output Y = y{X) with a 
smaller variance than the sample mean of a SRS lf32l . Figure 
[T] shows 10 samples of two random variables, Xi and X2, 
obtained with SRS and LHS schemes. We can see that the 
result of LHS is more spread out and does not display the 
clustering effects found in SRS. 

However, LHS does not reach the smallest possible variance 
for the sample mean. Since it is only a form of stratified 
random sampling and it is not directly related to any cri- 
terion, it may also perform poorly in metamodel estimation 
and prediction of the model output. Therefore, some authors 
have proposed to enhance LHS not only to fill space in 
one dimensional projection, but also in higher dimensions 
|25|. One powerful idea is to adopt some optimality criterion 
applied to LHS, such as entropy, integrated mean square 



the wrap-around discrepancy 



(a) Simple Random Sampling 



(b) Latin Hypercube Sampling 



W\X^{n)): 



Fig. 1. Examples of two ways to generate a sample of 
size n = 10 from two variables X = [Xi,X2] where Xi 
has a uniform distribution ^/[0,1] and X2 has a normal 
distribution A/'(0, 1). Equprobable stratas are shown in 
each dimension. 



i,3 = l k = l 
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(5) 

which allows to suppress bound effects (by wrapping the 
unit cube for each coordinate). 

The optimization of LHS can be done following different 
methods: choice of the best (in terms of the chosen criteria) 
LHS amongst a large number of different LHS, column wise- 
pairwise exchange algorithms, genetic algorithms, simulated 
annealing, etc [12J, |17|. In our tests, we have found that the 
simulated annealing algorithm (with a geometrical temperature 
descent and with a slight noise on the initial condition) gives 
the best results for all the criteria L18J . Figure [2l gives some 
examples of two-dimensional LHS of size n = 16, optimized 
following three different criteria with the simulated annealing 
algorithm. We see that uniform repartitions of the points are 
nicely respected. 



error, minimax and maximin distances, etc. For instance, the 
maximin criterion consists in maximizing the minimal distance 
between the points |13|. This leads to avoid situations with 
too close points. The paper (241 examines some optimal 
maximin distance designs constructed within the class of 
Latin hypercube arrangements. The conceptual simplicity of 
these designs has led to their large popularity in practical 
applications O. 



3.2. Low-discrepancy Latin hypercube samples 

Alternative metamodel-independent criteria, based on dis- 
crepancy measures, consist in judging the uniformity quality 
of the design. Discrepancy can be seen as a measure between 
an initial configuration and an uniform one. It is a comparison 
between the volume of intervals and the number of points 
within these intervals |^. There exists different kinds of 
definition using different forms of intervals or different norms 
in the functional space. Discrepancy measures based on L2 
norms are the most popular in practice because they can be 
analytically expressed and are easy to compute. Among them, 
two measures have shown remarkable properties lfT2ll . El, ||5l: 

• the centered discrepancy 



Maximin 



Centered discrepancy 



Wrap-around discrepancy 

Fig. 2. Visual comparisons of LHS {d = 2, n = 16) 
optimized following three different criteria (below each 
figure). 
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where Xs{n) denotes the input learning sample with 
[u^^} are the nor- 

Xs{n) = 



n input vectors and l^^ 'l are 

V / i=l..n,/c=l..d 

malized values in [0, 1] of the design 



, X 



i=l..n,k=l..d 



3.3. Projection properties of space filling designs 

In addition to the space filling property on the sample space, 
one important property of the initial designs is their robustness 
to the dimension decrease. A LHS structure for the space 
filling design is not sufficient because it only guarantees good 
repartitions for one-dimensional projections, and not for the 
other dimensions of projection. Indeed, LHS ensures that each 
of the input variables has all proportion of its range which is 
represented (equiprobable stratas are created for each input 



variable). In contrary, no equiprobable stratas are created in 
the various multi-dimensional spaces of the input variables. 

We then argue that the sample points of a space filling 
design have to be well spread out when projected onto a 
subspace spanned by a subset of coordinate axes. This property 
is particularly important when the initial design is made in 
dimension d and the metamodel fitting is made in a smaller 
dimension (see an example in |1|). In practice, this is often 
the case because the initial design may reveal with screening 
methods the useless (i.e., non influent) input variables that we 
can neglect during the metamodel fitting step |26|. Moreover, 
when a selection of input variables is made during the meta- 
model fitting step (as for example in (191), the new sample, 
solely including the retained input variables, has to keep good 
space filling properties. 

Figure [3] compares the two-dimensional projections of the 
maximin LHS and low wrap-around discrepancy LHS (called 
WLHS) with n = 100 points and different initial dimensions 
(from d = 3 to 15). The reference criterion values are given 
for d = 2. For dimension larger than 2, we compute the 
new criterion values by considering all the two-dimensional 
projections of the initial design. A robust criterion to the 
dimension decrease would lead to a small increase of the 
criterion value. The criteria behave very diferently between 
the two types of design: 

• 2D projection criteria of WLHS regularly and slightly 
deteriorate. Then, 2D projections of WLHS made in 
dimensions close to 2 keep rather good space-filling 
properties. 

• 2D projection criteria of maximin LHS sharply and 
strongly deteriorate from the first dimension increase at 
d = 3. Then 2D projection criteria of maximin LHS 
remain stable at poor values for larger dimensions. 

Similar tests with different sample sizes n and for the three- 
dimensional and four-dimensional projections have led to the 
same conclusions. All these results show that a WLHS is the 
preferable initial design for fitting a computer code metamodel 
in high dimensional cases. 

3.4. Numerical studies on toy functions 

At present, we perform two numerical studies to evaluate 
the impact of an inadequate design on the metamodel fitting 
process. For the metamodel, we use the Gp model Fop 
described in §2. The quality of the metamodel predictor is 
measured by the so-called predictivity coefficient Q2 (i.e., 
the determination coefficient computed on a test sample), 
which gives the percentage of the output variance explained 
by the metamodel: 



Q2 = l 



(6) 



with {x^^\ . . . ^x^'^^'^) the test sample of size rit, Yop = 
E{Yq^) the Gp predictor (Eq. Q) and y the mean of the 
output test sample {y{x''^^), . . . , y{x''^^)). 
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Maximin LHS 




Low wrap-around discrepancy LHS (WLHS) 

Fig. 3. Criterion values (up: maximin, bottom: wrap- 
around discrepancy) obtained with 2D projections of de- 
signs coming from two types of LHS (containing n = 100 
points), with different dimensions: d = 2,3,4,5,10,15. 
Boxplots are obtained by repeating 100 optimizations 
using different initial LHS. 



3.4.1. A two-dimensional test case. Our first test involves 
a two-dimensional analytical function (called the irregular 
function): 

fix) = g^-^ + ^+44-4x^ + M+^4 ^ 

^ ^ 5 5 3 ^ ^10 ^4x^ + 4x^ + 1 

with X e [—1, 1]^. Figure 13] represents the irregular function. 

We have made several comparisons between random LHS 
and different space filling designs before fitting a metamodel 
ifTSl . In the following, we show our results concerning the 
random LHS and the WLHS, which has provided the best 
results. For a size n of the learning sample and each type 
of design, we repeat 100 times the following procedure: we 
generate an initial input design of n observations, we obtain n 
outputs with the toy function, we fit a Gp metamodel ([B, and 
we evaluate its predictivity coefficient Q2 using a test sample 
of large size (rit = 10000). Therefore, for each type of LHS, 
we obtain 100 values of Q2 whose mean and variance give 
us the efficiency and robustness of the design in terms of Gp 
quality. 

The initial LHS design optimized with the wrap-around 
discrepancy (Eq. (|5])) has given us the best results. In Figure 
[51 we compare the predictivity coefficients obtained with 
non optimized LHS (random LHS) and those obtained with 
optimized LHS (WLHS). The size of the design increases 
from n = 10 to n = 46 (by step of 4), which leads to a 




Fig. 4. Graphical representation of tlie irregular function 
on 



regular increase of Q2- For each size n, the boxplot represents 
the summary of the 100 values of Q2- In the all range of 
n, Q2 of the WLHS are better than the random LHS ones. 
Furthermore, much smaller variances (boxplots are smaller) 
are shown for WLHS and lead to the conclusion that these 
designs are more robust than others. This property is rather 
natural because there are much less variability between the 
100 different WLHS than between the 100 different random 
LHS (because of the optimization process). Differences are 
particularly important for sizes n = 30 and n = 34: the WHS 
lead to very competitive Gp metamodels {Q2 ~ 0.95 and 
boxplot width ~ 0.05) while random LHS give uncompleted 
metamodels {Q2 ^ 0.9 and boxplot width ~ 0.2). 





Fig. 5. For the irregular function, Gp Q2 evolution in 
function of the learning sample size n and for two types of 
LHS (left: random LHS; right: WLHS). 



with ai = 1, a2 = 2, as = 3, a4 = 4, as = 5, G [0, 1]^. 

We have made several comparisons between random LHS 
and different space filling designs before fitting a metamodel 
ifTSl . In the following, we show our results concerning the 
random LHS and the WLHS, which has provided the best 
results. For a size n of the learning sample and each type 
of design, we repeat 100 times the following procedure: we 
generate an initial input design of n observations, we obtain n 
outputs with the toy function, we fit a Gp metamodel ([B, and 
we evaluate its predictivity coefficient Q2 using a test sample 
of large size (rit = 10000). Therefore, for each type of LHS, 
we obtain 100 values of Q2 whose mean and variance give 
us the efficiency and robustness of the design in terms of Gp 
quality. 

As in the previous section, the initial LHS design optimized 
with the wrap-around discrepancy (Eq. (O) has given us 
the best results. In Figure [6l we compare the predictivity 
coefficients obtained with non optimized LHS (random LHS) 
and those obtained with optimized LHS (WLHS). The size of 
the design increases from n = 22 to n = 40 (by step of 2), 
which leads to a regular increase of Q2- For each size n, the 
boxplot represents the summary of the 100 values of Q2- In 
the all range of n, Q2 of the WLHS are better than the random 
LHS ones. Furthermore, much smaller variances are shown for 
WLHS and lead to the conclusion that these designs are more 
robust than others. For small sample sizes, the Q2 differences 
reach 0.2 between the two types of design: (52 (LHS) ~ 0.6 
and (52 (WLHS) ~ 0.8. In industrial applications, such a 
difference makes the distinction between "bad" (unacceptable) 
metamodels and good ones. The latter can be used for example 
for quantitative sensitivity studies. 





Fig. 6. For the g-Sobol 5d function, Gp Q2 evolution in 
function of the learning sample size n and for two types of 
LHS (left: random LHS; right: WLHS). 



3.5. Conclusion of numerical tests 



3.4.2. A five-dimensional test case. Our second test involves 
a five-dimensional analytical function (called the g-Sobol 5d 
function): 

^ |4xi-2|+ai 



/(^) = E- 



1 



In conclusion of our numerical study, the LHS optimized 
with the wrap-around discrepancy has provided efficient re- 
sults for the Gp metamodel fitting, even in high dimension. 
Furthermore, we have found that this design guarantees correct 
repartitions of the points for all the two-dimensional projec- 
tions, while other types of LHS (like maximin) have bad 



repartitions for these projections. Other types of LHS can also 
provide good results but less systematically (W] . For instance, 
I?] has studied quasi-Monte Carlo samples (Sobol suites and 
Halton sequences) and has shown that these sequences are less 
performant than other space filling designs in terms of the Gp 
metamodel fitting. 

Of course, such designs have to be seen as initial ones. If 
possible, in a second step, adaptive designs can improve meta- 
model predictivity in a very efficient way 1 18|, for instance by 
choosing new simulation points in poorly predicted areas. 

4. Test sample selection for metamodel valida- 
tion 

In practical cases, only a small number of simulations can be 
performed with the computer code in order to fit a metamodel. 
Once the metamodel has been built, estimating its predictivity 
is an important issue. Indeed, a safe use of this metamodel 
to answer to uncertainty or sensitivity problems requires a 
precise estimation of its capabilities. In this section, we make 
a discussion on algorithms of predictivity estimation. 

4.1. Classical validation methods 

Let us consider the d-dimensional input vector x = 
{xi^ . . . ^Xd) G A', where A' is a bounded domain 
of and y{x) G 1? is the computer code output. 
We suppose that a metamodel Y{x) has been fitted us- 
ing {^{x''^\y{x''^^))^ . . . ^ {x''^\y{x''^^))), a A^- size learning 
sample of computer code experiments. 

The test sample approach consists in comparing the meta- 
model predictions on simulation points not used in the meta- 
model fitting process. This gives some prediction residuals 
(which can be finely analyzed) and global quality measures as 
the metamodel predictivity coefficient Q2 (Eq. ©). Such test 
points set is called a test sample (or also validation sample 
or prediction sample). This method requires new calculations 
with the computer code and the first question we have to face 
up is the sufficient number of prediction points to obtain the 
required accuracy of our global validation measures. For cpu 
time expensive code, it can be difficult to provide a sufficient 
number of test points. Some convergence visualisation tools 
of the global validation measures can be used to answer to 
this first question. 

Another important question for the test sample approach is 
the localization of these test points. The usual practice is to 
choose an independent Monte Carlo sample for the test sam- 
ple. However, if the sample size is small, the proposed points 
can be badly localized, for example near learning points or 
leaving large space domain unsampled. A fine strategy could 
be to use, as the test sample, a space filling design (which 
consists in filling the input variable space X as uniformly 
as possible). Unfortunately, this solution does not avoid the 
possibility of too strong proximity between learning points and 
test points. Such proximity would lead to too optimistic quality 
measures, and consequently to a biased prectivity estimation. 



The second solution to validate a metamodel, the cross 
validation method, is extremely popular in practice because 
it avoids new calculations on the computer code. The cross 
validation method proposes to divide the initial sample on a 
learning sample and a test sample. A metamodel is estimated 
with the points in the new learning sample and prediction 
residuals are obtained via the new test sample. This process is 
repeated several times by using other divisions of the learning 
sample. Finally all the prediction residuals can be used to 
compute the global predictivity measures. The leave-one-out 
procedure is a particular case of the cross validation method 
where just one observation is left out at each step. 

The first drawback of the cross validation method is its cost, 
which can become large due to many metamodel fitting pro- 
cesses. Moreover, if the initial design has a specific geometric 
structure (which aims at optimizing the metamodel fitting), 
the deletion of points from the learning sample causes the 
breakdown of the specific design structure while creating the 
new learning sample. Indeed, the new learning sample does 
not have the adequate statistical and geometric properties of 
the initial design and the metamodel fitting process might fail. 
This could lead to too pessimistic quality measures. 

To sum up, the test sample method requires too many new 
prediction points (to avoid too optimistic validation criteria), 
while the cross-validation method can provide too pessimistic 
validation criteria. Therefore, to solve this dilemma, an heuris- 
tic new solution has been introduced in LIOJ . 1.9.1 and is 
presented in the next section. 

4.2. A new optimized validation design 

Retaining the test sample method, we limit its main draw- 
back by minimizing the number of necessary points in the 
test sample. In this goal, an algorithm allows the specification 
of new design points decreasing the discrepancy of an initial 
design |6|. This sequential algorithm gives us at each step 
the prediction point furthest away from the other points of 
the design. The algorithm performs its optimization process in 
the space X of the input variables x. By choosing the future 
prediction points in the unfilled zone of the learning sample 
design, we aim at capturing the right metamodel predictivity 
using only a small number of additional points. Note that such 
ideas have also been proposed in |28 | for different purposes. 

We have not theoretically studied the computational effi- 
ciency of this algorithm over the computational efficiency of 
the traditional methods (introduced in the previous section). 
However, our intuition is that mean square error computed by 
this algorithm avoids the biases, which could be caused by too 
strong proximities between the test sample points and between 
test sample points vs. learning sample points. 

Let us consider X/(n/) = (iCj^)i=i..n/ a low discrepancy 
sequence of n/ points in [0, 1]^. A low discrepancy sequence 
is a deterministic design constructed to uniformly fill the 
space with regular patterns. Among all the low discrepancy 
sequence, Halton, Hammersley, Faure and Sobol sequences 
are the most famous |16|. In the following, we will use the 



Hammersley sequence which, on a few tests, have shown better 
properties than the others |6|. The chosen discrepancy measure 
is the centered discrepancy D'^{-) (Eq. Q). 

To obtain an additional point of the initial A^-size sample, 
noticed Xs{N), we use the following algorithm: 

1) For z = 1, . . . , n/, 

• X,(Ar + l) = {a?(i),...,a?W}Ua?/(^); 

• compute BiU = D^{Xs{N + 1)) - D^{Xs{N)); 

2) select z* such that = argmin Dif^; 

1=1,. ..,nf 

3) obtain the new point iC/*^**^. 

This algorithm is repeated sequentially to obtain Attest test 
points, by updating the initial design and the low discrepancy 
sequence. For example, for the second point, we reinitalize the 
design by the following: Xs{N + 1) = {x^^\ . . . , x^^^ U 
x/'^^ and Xf{nf - 1) = {xf^^\ . . . ,Xf^''f^\xf^'*\ 

This algorithm just consists in adding to the initial design 
some points of a low discrepancy sequence by minimizing the 
discrepancy differences between the initial and the new design. 
The size of the low discrepancy sequence is required to be as 
large as possible, especially if d is large. Figure |7] gives an 
example of the specification with our algorithm of Attest = 4 
new points (the crosses) inside an initial Monte Carlo design 
(N = 46, d = 2). One of the advantage of this algorithm is its 
size-independence (related to the number of added points): the 
sequence of added points is deterministic and will be always 
the same for the same X/(n/). In the following, the design 
obtained using this algorithm is called the sequential validation 
design. 
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Fig. 7. Example of the sequential algorithm: N = 46, 
d = 2, A/test = 4. The bullets are the points of the initial 
design while the crosses are the new specified points. 



4.3. Numerical studies on toy functions 

4.3.1. A two-dimensional test case. To compare the sequen- 
tial validation design with other test designs for the metamodel 
validation purpose, we first perform an analytical test using a 
two-dimensional toy function, called the cosin2 function: 
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Fig. 8. Graphical representation of the cosin2 function on 

[0;1]2. 

Figure [S] represents the cosin2 function. 

Gp metamodels ([B are fitted using learning samples of 
differents sizes A^ba^ ^ba ranges from 10 to 40 allowing a 
wide variety of metamodel predictivity coefficients Q2, from 
(null predictivity) to 1 (perfect predictivity). The initial 10- 
size design is a maximin LHS. The other learning designs 
(of increased size) are obtained by sequentially adding points 
to the design, while maintaining the LHS properties of the 
design and keeping some optimality properties (maximizing 
the mean distance from each design point to all the other points 
in the design [T7|). Choosing an initial maximin LHS design, 
while we have shown in Section [3] that WLHS is better than 
maximin LHS for the Gp fitting process, is not in contradiction 
with our objectives in this section: our goal is now to study 
the Gp metamodel validation. Anyway, we are not able to 
keep the properties of maximin LHS or WLHS when gradually 
increasing the size of the learning sample. 

The black line in Figure [9] shows the evolution of Q2 in 
function of the learning sample size. This reference value 
for the predictivity coefficient has been computed for each 
metamodel by taking its mean over 100 test samples of 
size A/test = 1000. The Q2 estimation by a leave one out 
procedure (pink line) strongly underestimates the exact Q2 for 
A^BA < 30. This is certainly due to the small number of points: 
leave one out is pessimistic in this case because each point 
deletion has a strong impact on the metamodel fitting process. 
The red curve gives the Q2 estimation using the sequential 
validation design described in the previous paragraph (with a 
Hammersley sequence of size rif = 10000). 



f{x) = cos(lOxi) + sin(10x2) + ^1X2 , {xi^X2) G [0, 1]^. 
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Results are greatly satisfactory for Attest > 20: the sequential 
validation design gives precise Q2 estimates in all cases and 
outperforms a crude Monte Carlo or LHS design. The green 
curves correspond to the minimal and maximal values obtained 
with 100 repetitions using an optimized LHS as the test 
design. As expected, these intervals are more reduced than the 
intervals obtained using a crude Monte Carlo sample as the 
test design (blue curves). As TVtest increases, these intervals 
contract, but always show the superiority of the sequential 
validation design, especially for low metamodel predictivity 
(Q2 < 0.9 and N^^ < 25). 
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Fig. 9. For the cosin2 function, Gp predictivity coefficient 
{Q2) in function of the learning sample size A^ba, estimated 
from different test sample sizes Atest- The dashed curves 
(blue and green) give the minimal and maximal values 
obtained with 100 repetitions of the random test design 
(Monte Carlo and LHS). 



4.3.2. An eight-dimensional test case. We perform now a 
second numerical test using the g-Sobol function in eight- 
dimension (called the g-Sobol 8d function): 



1 



with ai = a2 = 3, tti = for (z = 3, . . . , 8), x G [0, 1]^ 

A Gp metamodel ([B is fitted on a learning sample (maximin 
LHS) of size A^ba = 40. We compute the reference value of 
the predictivity coefficient by taking its mean over 100 test 
samples of size Atest = 1000 and obtain =0.83. We then 
apply the sequential validation design described previously 
(with a Hammersley sequence of size n/ = 10000) by 
adding Atest = 50 new points to the design, and we obtain 
Qseq50 _ q which is closc to the true value. We compare 
this result with 100 crude Monte Carlo samples of the same 
size (Atest = 50), which give the 90% confidence interval 
[0.79, 0.91] for . This last result is rather large and shows 
the insufficient number of points if we choose a crude Monte 
Carlo design. 

Figure [TOl shows the evolution of the estimated Q2 for 
test bases with different sizes, ranging from Atest = 10 to 
^test = 50. The solid red line shows the results obtained 
with the sequential validation design while the dotted blue 
lines show the 100 sequentially increased crude Monte Carlo 
samples. This figure illustrates the poor estimates we obtain 
when using small size (Atest < 50) of Monte Carlo samples 
for validation. On the contrary, the sequential validation design 
allows to obtain a good approximation of the true predictivity 
coefficient even for small test sample sizes. Results are precise 
for Attest > 25. 

4.4. Application to a nuclear safety computer code 

In this section we apply our algorithms on a complex 
computer model used for nuclear reactor safety. It simulates a 
hypothetical thermal-hydraulic scenario on Pressurized Water 
Reactors: a large-break loss of primary coolant accident (see 
Fig. [TT]) for which the output of interest is the peak cladding 
temperature. This scenario is part of the Benchmark for Uncer- 
tainty Analysis in Best-Estimate Modelling for Design, Opera- 
tion and Safety Analysis of Light Water Reactors [2J proposed 
by the Nuclear Energy Agency of the Organisation for Eco- 
nomic Co-operation and Development (OCDE/NEA). It has 
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Fig. 10. For the g-Sobol 8d function, Gp predictivity 
coefficient {Q2) in function of the test sample size A/test, 
for two types of validation design: sequential (red) and 
crude Monte Carlo (blue). Dotted blue lines correspond to 
100 different crude Monte Carlo samples). 

been implemented on the french computer code CATHARE2 
developed at the Commissariat a I'Energie Atomique (CEA). 
Figure fT2l illustrates 100 CATHARE2 simulations (by varying 
input variables of the accidental scenario) giving the cladding 
temperature in function of time. 




Fig. 11. Illustration of a large-break loss of primary 
coolant accident on a nuclear Pressurized Water Reactor. 

In our exercise, a Gp metamodel ([B of the first peak 
cladding temperature (which is a scalar variable) has to be 
estimated with A/^ = 100 simulations of the computer model 
(the input design is a maximin LHS). The cpu time is 
twenty minutes for each simulation with a standard computer 
(Pentium IV PC). The complexity of the computer model 
lies in the high-dimensional input space, d = 53 random 




time(s) 



Fig. 12. 100 output curves (cladding temperatures in 
function of time) of the CATHARE2 code. The output 
variable of interest for the reactor safety is the first peak 
of the cladding temperature. 

input variables are considered: physical laws essentially, but 
also initial conditions, material properties and geometrical 
modeling. Their probability distributions are either normal or 
log-normal, and both are truncated. Such a number of input 
variables is rather large for the metamodel fitting problem. 
This difficult fit (due to the high dimensionality and small 
learning sample size) can be made thanks to the algorithm of 
O, specifically devoted to this situation. The obtained Gp 
metamodel ([B contains a linear regression part (including 7 
input variables) and a centered Gp model with a generalized 
exponential covariance function (including 6 input variables). 
The reference quality of this Gp model is measured via an 
additional 1000-size test sample, which gives — 0-66. 

Figure [13] shows the evolution of the estimated Q2 for 
test bases with different sizes, ranging from A/test = 10 to 
Atest = 95. The sequential validation design gives coarse esti- 
mations for all the test design sizes and begins to give precise 
results for Attest ^ 40. Some inadequacies, which remain when 
^test ^ [75, 90], have to be finely analyzed in a further work. In 
any cases, sequential validation design estimations are clearly 
less hazardous than using a crude Monte Carlo test sample to 
validate the metamodel: the 90%-confidence intervals obtained 
using Monte carlo samples show extremely large variation 
ranges (because of the high dimensionality of the input space: 
d = b3). Q2 estimation using a Monte Carlo test sample can 
lead to a strongly erroneous result. Same results have been 
obtained using optimized LHS for the test design instead of a 
crude Monte Carlo sample. 

5. Conclusion and future works 

In this paper, we have proposed to look at two practical 
problems when fitting a metamodel to small- size data samples: 
the initial design and the validation method choices. These 
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Fig. 1 3. For the nuclear safety computer code application, 
estimation of the metamodel (Gp) predictivity coefficient 
(Q2) in function of the test sample size A/test, for two types 
of validation design: sequential (red) and crude Monte 
Carlo (blue). 



problems are relevant when a cpu time expensive code has 
to be replaced by a simplified model with negligible cost 
(I;e., a metamodel). Such replacement is useful to resolve 
optimisation, uncertainty propagation or sensitivity analysis 
issues. 

We have first paid attention to the initial input design. Our 
numerical tests concentrate on the popular Gp metamodel 
and on the LHS. This type of design, developed thirty years 
ago, is the most widely used in industrial applications. We 
have shown that an excellent way to optimize its properties, 
in the objective of the best metamodel fit, is to minimize 
some discrepancy measures, especially the wrap-around 
discrepancy. An alternative strategy, if possible, would be to 
use some adaptive designs |18|. For the Gp metamodel, this 
kind of design is well-adapted due to the availability of the 
variance expression (the MSE of the metamodel predictor, see 
Eq. ©). 

Secondly, we have looked at the metamodel validation 
process and have shown that the test sample approach can 
provide erroneous results for small sizes of the test sample. 
Moreover, the leave one out approach can strongly underesti- 
mate the metamodel predictivity for small sizes of the whole 
database. We have proposed to use a recent algorithm, called 
the sequential validation design, which puts prediction points 
in the unfilled zones of the learning sample design. Therefore, 
a minimal number of points is required to obtain a good 
estimation of the metamodel predictivity. Our numerical tests 
on analytical functions and real application cases have shown 
that the sequential validation design outperforms the classical 
metamodel validation methods, especially in high dimensional 
context. For our analytical functions, the sequential validation 
design gives precise estimate of the metamodel predictivity 



with a test sample size A/test > 25, while for our industrial 
application, the minimal bound is A/test ^ 40. 

Further works are necessary to more deeply study the 
validation designs (other test functions with different effective 
dimensionality and complexity). Moreover, it would be useful 
to find a criterion to determine when the sequential validation 
design can be ended. Finally, the ultimate goal of such studies 
will be to define a global strategy of allocating simulation 
points between the metamodel fitting step and the metamodel 
validation step. 
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